Chapter 23
Survival Regression
IN THIS CHAPTER
Knowing when to use survival regression
Grasping the concepts behind survival regression
Running and interpreting the outcome of survival regression
Peeking at prognosis curves
Estimating sample size for survival regression
Survival regression is one of the most commonly used techniques in biostatistics. It overcomes the
limitations of the log-rank test (see Chapter 22) and allows you to analyze how survival time is
influenced by one or more predictors (the X variables), which can be categorical or numerical. In this
chapter, we introduce survival regression. We specify when to use it, describe its basic concepts, and
show you how to run survival regressions in statistical software and interpret the output. We also
explain how to build prognosis curves and estimate the sample size you need to support a survival
regression.
Note: Because time-to-event data so often describe actual survival, with death as the event, we use
the terms death and survival time throughout this chapter. But everything we say about death applies
to the first occurrence of any event, like pre-diabetes patients restoring their blood sugar to normal
levels, or cancer survivors experiencing a recurrence of cancer.
Knowing When to Use Survival Regression
In Chapter 21, we examine the special problems that come up when the researcher can’t follow a
participant long enough to observe whether they ever experience the event being studied. To recap,
in this situation you should censor the data. This means you acknowledge that the participant was
observed for only a limited amount of time and was then lost to follow-up. In that chapter, we also
explain how to summarize survival data using life tables and the Kaplan-Meier method, and how to
graph time-to-event data as survival curves. In Chapter 22, we describe the log-rank test, which you
can use to compare survival among a small number of groups. Examples include participants taking a
drug versus a placebo, or participants initially diagnosed at four different stages of the same
cancer.
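To make the recap concrete, here is a minimal Python sketch of how censored data are commonly encoded
(a follow-up time plus an event indicator: 1 for death, 0 for censored) and how the Kaplan-Meier
product-limit estimate is computed from them. The function name kaplan_meier and the six-participant
data set are hypothetical illustrations, not taken from Chapter 21. The sketch assumes the usual
convention that a participant censored at the same time as a death still counts as at risk at that
time. In practice, you would let your statistical software do this.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate (illustrative sketch).

    times  -- follow-up time for each participant
    events -- 1 if the participant died at that time, 0 if censored
    Returns a list of (time, estimated survival) at each distinct death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    observed = Counter(times)   # everyone whose follow-up ends at time t
    at_risk = len(times)        # all participants start in the risk set
    survival = 1.0
    curve = []
    for t in sorted(observed):
        d = deaths[t]           # Counter returns 0 for times with no deaths
        if d > 0:
            # Multiply by the fraction of at-risk participants who survive past t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Deaths and censorings at t both leave the risk set after t
        at_risk -= observed[t]
    return curve

# Hypothetical data: six participants, two of them censored (event = 0)
times = [2, 3, 3, 5, 6, 8]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])
# prints [(2, 0.833), (3, 0.667), (5, 0.444), (8, 0.0)]
```

Notice that the censored participants (at times 3 and 6) never lower the curve themselves; they simply
leave the risk set, so the information from their partial follow-up is still used up to the point
where they were lost.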
But the log-rank test has limitations:
The log-rank test doesn’t handle numerical predictors well. Because this test compares survival
among a small number of categories, it does not work well for a numerical variable like age. To
compare survival among different age groups with the log-rank test, you would first have to
categorize the participants into age ranges. The age ranges you choose for your groups should be